Corum Province
Multi-context principal component analysis
Wang, Kexin, Bhate, Salil, Pereira, João M., Kileel, Joe, Figlerowicz, Matylda, Seigal, Anna
Principal component analysis (PCA) is a tool to capture factors that explain variation in data. Across domains, data are now collected across multiple contexts (for example, individuals with different diseases, cells of different types, or words across texts). While the factors explaining variation in data are undoubtedly shared across subsets of contexts, no tools currently exist to systematically recover such factors. We develop multi-context principal component analysis (MCPCA), a theoretical and algorithmic framework that decomposes data into factors shared across subsets of contexts. Applied to gene expression, MCPCA reveals axes of variation shared across subsets of cancer types and an axis whose variability in tumor cells, but not mean, is associated with lung cancer progression. Applied to contextualized word embeddings from language models, MCPCA maps stages of a debate on human nature, revealing a discussion between science and fiction over decades. These axes are not found by combining data across contexts or by restricting to individual contexts. MCPCA is a principled generalization of PCA to address the challenge of understanding factors underlying data across contexts.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Republic of Türkiye > Corum Province > Corum (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
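The abstract above contrasts MCPCA with two baselines: PCA on data pooled across contexts, and PCA on each context separately. The sketch below illustrates only those two baselines on synthetic data and does not implement MCPCA. The two contexts, their dimensions, and the helper name `top_axis` are assumptions made for illustration.

```python
import numpy as np

def top_axis(data):
    """Return the unit-norm leading principal axis of a (samples x features) array."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
n = 500

# Hypothetical context A: strong variation along feature 0, mean offset along feature 2.
context_a = rng.normal(scale=0.1, size=(n, 3))
context_a[:, 0] += rng.normal(scale=3.0, size=n)
context_a[:, 2] += 10.0

# Hypothetical context B: strong variation along feature 1, opposite mean offset.
context_b = rng.normal(scale=0.1, size=(n, 3))
context_b[:, 1] += rng.normal(scale=3.0, size=n)
context_b[:, 2] -= 10.0

axis_a = top_axis(context_a)            # dominated by feature 0
axis_b = top_axis(context_b)            # dominated by feature 1
axis_pooled = top_axis(np.vstack([context_a, context_b]))  # dominated by feature 2

print(np.round(np.abs(axis_a), 2))
print(np.round(np.abs(axis_b), 2))
print(np.round(np.abs(axis_pooled), 2))
```

In this toy setup the pooled analysis is dominated by the difference in context means, while each per-context analysis sees only its own axis. Neither baseline reports which axes are shared across which subsets of contexts, which is the gap the abstract says MCPCA addresses.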
VCWorld: A Biological World Model for Virtual Cell Simulation
Wei, Zhijian, Ma, Runze, Wang, Zichen, Li, Zhongmin, Song, Shuotong, Zheng, Shuangjia
Virtual cell modeling aims to predict cellular responses to perturbations. Existing virtual cell models rely heavily on large-scale single-cell datasets, learning explicit mappings between gene expression and perturbations. Although recent models attempt to incorporate multi-source biological information, their generalization remains constrained by data quality, coverage, and batch effects. More critically, these models often function as black boxes, offering predictions without interpretability or consistency with biological principles, which undermines their credibility in scientific research. To address these challenges, we present VCWorld, a cell-level white-box simulator that integrates structured biological knowledge with the iterative reasoning capabilities of large language models to instantiate a biological world model. VCWorld operates in a data-efficient manner to reproduce perturbation-induced signaling cascades and generates interpretable, stepwise predictions alongside explicit mechanistic hypotheses. In drug perturbation benchmarks, VCWorld achieves state-of-the-art predictive performance, and the inferred mechanistic pathways are consistent with publicly available biological evidence.
- Asia > Middle East > Republic of Türkiye > Corum Province > Corum (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Hypothesis Hunting with Evolving Networks of Autonomous Scientific Agents
Liu, Tennison, Estévez, Silas Ruhrberg, Bentley, David L., van der Schaar, Mihaela
Large-scale scientific datasets -- spanning health biobanks, cell atlases, Earth reanalyses, and more -- create opportunities for exploratory discovery unconstrained by specific research questions. We term this process hypothesis hunting: the cumulative search for insight through sustained exploration across vast and complex hypothesis spaces. To support it, we introduce AScience, a framework modeling discovery as the interaction of agents, networks, and evaluation norms, and implement it as ASCollab, a distributed system of LLM-based research agents with heterogeneous behaviors. These agents self-organize into evolving networks, continually producing and peer-reviewing findings under shared standards of evaluation. Experiments show that such social dynamics enable the accumulation of expert-rated results along the diversity-quality-novelty frontier, including rediscoveries of established biomarkers, extensions of known pathways, and proposals of new therapeutic targets. While wet-lab validation remains indispensable, our experiments on cancer cohorts demonstrate that socially structured, agentic networks can sustain exploratory hypothesis hunting at scale.
- Asia > Middle East > Republic of Türkiye > Corum Province > Corum (0.05)
- North America > United States > Colorado (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Nephrology (0.93)
- Materials > Chemicals (0.92)
- Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
CellCLIP -- Learning Perturbation Effects in Cell Painting via Text-Guided Contrastive Learning
Lu, Mingyu, Weinberger, Ethan, Kim, Chanwoo, Lee, Su-In
High-content screening (HCS) assays based on high-throughput microscopy techniques such as Cell Painting have enabled the interrogation of cells' morphological responses to perturbations at an unprecedented scale. The collection of such data promises to facilitate a better understanding of the relationships between different perturbations and their effects on cellular state. Towards achieving this goal, recent advances in cross-modal contrastive learning could, in theory, be leveraged to learn a unified latent space that aligns perturbations with their corresponding morphological effects. However, the application of such methods to HCS data is not straightforward due to substantial differences in the semantics of Cell Painting images compared to natural images, and the difficulty of representing different classes of perturbations (e.g., small molecule vs CRISPR gene knockout) in a single latent space. In response to these challenges, here we introduce CellCLIP, a cross-modal contrastive learning framework for HCS data. CellCLIP leverages pre-trained image encoders coupled with a novel channel encoding scheme to better capture relationships between different microscopy channels in image embeddings, along with natural language encoders for representing perturbations. Our framework outperforms current open-source models, demonstrating the best performance in both cross-modal retrieval and biologically meaningful downstream tasks while also achieving significant reductions in computation time.
- Asia > Middle East > Republic of Türkiye > Corum Province > Corum (0.05)
- North America > United States > Gulf of Mexico > Central GOM (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
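The CellCLIP abstract refers to cross-modal contrastive learning that aligns perturbations with their morphological effects in one latent space. As a rough illustration, the sketch below computes a generic CLIP-style symmetric contrastive loss in NumPy. It is not CellCLIP's implementation: the function name `clip_style_loss`, the temperature value, and the random embeddings are assumptions, and the channel encoding scheme and encoders described in the abstract are not modeled.

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is assumed to be a matched pair."""
    # L2-normalise so that dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row; the target is the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))   # stand-in for encoder outputs

aligned_loss = clip_style_loss(embeddings, embeddings)
# Shifting the rows by one breaks every pairing.
misaligned_loss = clip_style_loss(embeddings, np.roll(embeddings, 1, axis=0))
print(aligned_loss < misaligned_loss)
```

Training with a loss of this shape pulls matched image and perturbation embeddings together and pushes mismatched pairs apart, which is what makes cross-modal retrieval possible afterwards.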
Gene-R1: Reasoning with Data-Augmented Lightweight LLMs for Gene Set Analysis
Wang, Zhizheng, Yang, Yifan, Jin, Qiao, Lu, Zhiyong
Gene set analysis (GSA) is a foundational approach for uncovering the molecular functions associated with a group of genes. Recently, LLM-powered methods have emerged that annotate gene sets with biological functions together with coherent explanatory insights. However, existing studies primarily focus on proprietary models, which have been shown to outperform their open-source counterparts but raise concerns over cost and data privacy. Furthermore, no research has investigated the application of advanced reasoning strategies to the GSA task. To address this gap, we introduce Gene-R1, a data-augmented learning framework that equips lightweight, open-source LLMs with step-by-step reasoning capabilities tailored to GSA. Experiments on 1,508 in-distribution gene sets demonstrate that Gene-R1 achieves substantial performance gains, matching commercial LLMs. On 106 out-of-distribution gene sets, Gene-R1 performs comparably to both commercial and large-scale LLMs, exhibiting robust generalizability across diverse gene sources.
- North America > United States > Maryland > Montgomery County > Bethesda (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > Middle East > Republic of Türkiye > Corum Province > Corum (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
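For context on what gene set analysis computes, a common classical form is over-representation analysis, which asks whether a gene set overlaps an annotated functional term more than chance would predict, using a hypergeometric tail probability. The sketch below shows that classical test only. It is not Gene-R1 or any LLM-based method, and the function name and all numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(universe_size, term_size, set_size, overlap):
    """P(X >= overlap) when set_size genes are drawn without replacement
    from universe_size genes, term_size of which carry the annotation."""
    upper = min(term_size, set_size)
    total = comb(universe_size, set_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 genes in the universe, 100 annotated to a term,
# and a query set of 50 genes.
p_weak = enrichment_p_value(20000, 100, 50, overlap=2)
p_strong = enrichment_p_value(20000, 100, 50, overlap=8)
print(p_weak, p_strong)
```

A smaller p-value means the overlap is harder to explain by chance. In practice such tests are run against many terms and corrected for multiple testing, which this sketch omits.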
Universal Deep Research: Bring Your Own Model and Strategy
Belcak, Peter, Molchanov, Pavlo
Deep research tools are among the most impactful and most commonly encountered agentic systems today. We observe, however, that each deep research agent introduced so far is hard-coded to carry out a particular research strategy using a fixed choice of tools. We introduce Universal Deep Research (UDR), a generalist agentic system that wraps around any language model and enables the user to create, edit, and refine their own entirely custom deep research strategies without any need for additional training or finetuning. To showcase the generality of our system, we equip UDR with example minimal, expansive, and intensive research strategies, and provide a user interface to facilitate experimentation with the system.
- Asia > India > Maharashtra (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > South Korea (0.04)
- Research Report (0.82)
- Workflow (0.67)